eXtended Block Cache

Authors

  • Stéphan Jourdan
  • Lihu Rappoport
  • Yoav Almog
  • Mattan Erez
  • Adi Yoaz
  • Ronny Ronen
Abstract

This paper describes a new instruction-supply mechanism called the eXtended Block Cache (XBC). The goal of the XBC is to improve on the Trace Cache (TC) hit rate while providing the same bandwidth. The improved hit rate is achieved by making the XBC a nearly redundancy-free structure. The basic unit recorded in the XBC is the extended block (XB), which is a multiple-entry, single-exit instruction block. An XB is a sequence of instructions ending on a conditional or an indirect branch. Unconditional direct jumps do not end an XB. To enable multiple entry points per XB, the XB index is derived from the IP of its ending instruction. Instructions within the XB are recorded in reverse order, which makes XBs easy to extend. The multiple entry points remove most of the redundancy. Since there is at most one conditional branch per XB, we can fetch up to n XBs per cycle by predicting n branches. This multiple fetch enables the XBC to match the TC bandwidth.
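The indexing idea in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design from the paper: the class name `ExtendedBlockCache`, the methods `record` and `fetch`, the `(ip, text)` instruction tuples, and the use of a Python dictionary are all hypothetical. The sketch also assumes the caller already knows the ending IP of the block it wants, and it ignores the case where two paths share a tail but diverge earlier.

```python
class ExtendedBlockCache:
    """Toy model (hypothetical): one extended block (XB) per ending-branch IP."""

    def __init__(self):
        # end_ip -> instructions in REVERSE program order (ending branch first)
        self.blocks = {}

    def record(self, instrs):
        """Record (ip, text) pairs given in program order.

        The last pair is assumed to be the conditional or indirect branch
        that ends the XB. Returns the index (the ending IP).
        """
        end_ip = instrs[-1][0]
        rev = list(reversed(instrs))
        stored = self.blocks.get(end_ip, [])
        if not stored:
            self.blocks[end_ip] = rev
        elif len(rev) > len(stored) and rev[:len(stored)] == stored:
            # Same tail, earlier entry point: extend the XB by appending the
            # earlier instructions. Reverse order makes this a plain append.
            self.blocks[end_ip] = stored + rev[len(stored):]
        # Otherwise the run is already covered (or diverges, which this
        # sketch does not handle), so nothing is stored twice.
        return end_ip

    def fetch(self, end_ip, entry_ip):
        """Return instructions from entry_ip to the ending branch, in program order."""
        stored = self.blocks.get(end_ip)
        if stored is None:
            return None
        for depth, (ip, _) in enumerate(stored):
            if ip == entry_ip:
                return list(reversed(stored[:depth + 1]))
        return None  # entry point not recorded in this XB
```

In this model, two runs that end on the same branch but start at different addresses share a single stored block, which is the sense in which multiple entry points remove redundancy.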


Similar resources

Cache Design for Eliminating the Address Translation Bottleneck and Reducing the Tag Area Cost

For the physical caches, the address translation delay can be partially masked, but it is hard to avoid completely. In this paper, we propose a cache partition architecture, called paged cache, which can not only mask the address translation delay completely but also reduce the tag area dramatically. In the paged cache, we divide the entire cache into a set of partitions, and each partition is ...


Switch MSHR: A Technique to Reduce Remote Read Memory Access Time in CC-NUMA Multiprocessors

A remote memory access poses a severe problem for the design of CC-NUMA multiprocessors because it takes an order of magnitude longer than the local memory access. The large latency arises partly due to the increased distance between the processor and remote memory over the interconnection network. In this paper, we develop a new switch architecture, called Switch MSHR (SMSHR), which provides t...


Re-evaluation of Fault Tolerant Cache Schemes

In general, fault tolerant cache schemes can be classified into 3 different categories, namely, cache line disabling, replacement with spare block, and decoder reconfiguration without spare blocks. This paper re-examines each of those fault tolerant techniques with a fixed typical size and organization of L1 cache, through extended simulation using SPEC2000 benchmark on individual techniques. T...


Scalable Hardware Mechanisms for Superscalar Processors

Abstract of the dissertation "Scalable Hardware Mechanisms for Superscalar Processors" by Steven Daniel Wallace, Doctor of Philosophy in Electrical and Computer Engineering, University of California, Irvine, 1997. Professor Nader Bagherzadeh, Chair. Superscalar processors fetch and execute multiple instructions per cycle. As more instructions can be executed per cycle, an accurate and high bandwidth instructi...


Reconciling Sharing and Spatial Locality Using Adjustable Block Size Coherent Caches

Several studies have shown that the performance of coherent caches depends on the relationship between the cache block size and the granularity of sharing and locality exhibited by the program. Large cache blocks exploit processor and spatial locality, but may cause unnecessary cache invalidations due to false sharing. Small cache blocks can reduce the number of cache invalidations, but increas...




Publication date: 2000